Abstract



We extend the log-mean linear parameterization introduced by Roverato et al. (2013)
for binary data to discrete variables with an arbitrary number of levels, and show that also
in this case it can be used to parameterize bi-directed graph models. Furthermore, we 
show that the log-mean linear parameterization allows one to simultaneously represent 
marginal independencies among variables and marginal independencies that only appear 
when certain levels are collapsed into a single one. We illustrate the application of 
this property by means of an example based on genetic association studies involving 
single-nucleotide polymorphisms. More generally, this feature provides a natural way 
to reduce the parameter count, while preserving the independence structure, by means 
of substantive constraints that give additional insight into the association structure of 
the variables. 

Keywords: Contingency table; Graphical Markov model; Marginal independence; Parsimo- 
nious model; Single-nucleotide polymorphism. 

1 Introduction 



Graphical models of marginal independence use a graph where every vertex is associated
with a variable and missing edges encode marginal independence relationships according
to a given Markov property; see Pearl and Wermuth (1994); Kauermann (1996); Banerjee
and Richardson (2003); Richardson (2003). These models were introduced by Cox and
Wermuth (1993, 1996) as covariance graphs, with dashed lines to represent edges in the
graph. More recently, lines with two arrowheads are often used in place of dashed edges
and, accordingly, these models are also referred to as bi-directed graph models. Graphical
models of marginal independence have appeared in several applied contexts as described in
Drton and Richardson (2008) and references therein. Their application is typically suggested
when the observed variables are jointly affected by unobserved variables (see, among others,
Richardson, 2003; Maathuis et al., 2009; Colombo et al., 2012) and, furthermore, they are a
special case of acyclic directed mixed graphs (Richardson, 2003) and can be regarded as the
building blocks of regression graph models (Drton, 2009; Wermuth and Sadeghi, 2012).



The probability distribution of a set of discrete variables is characterized by the
associated probability table, but defining a suitable parameterization for bi-directed graph
models is not straightforward; see Drton and Richardson (2008); Drton (2009); Lupparelli
et al. (2009); Roverato et al. (2013). A basic requirement for the flexible implementation
of marginal constraints is that interaction terms involving a subset of variables satisfy
upward compatibility, that is, they should reflect a property of the corresponding marginal
distribution; see Drton and Richardson (2008) for details and extensive references. Upward
compatibility means invariance with respect to marginalization but, for discrete variables
with an arbitrary number of levels, a stronger invariance property may be required. As shown
in the example below, there are situations where the research question involves different
collapsed versions of the same variable. Collapsing two or more levels of a discrete variable
into a single level can be regarded as a special kind of marginalization, and invariance
with respect to this operation is a useful feature for a parameterization.

Example 1.1 (Genetic association analysis). Genetic association studies aim at identifying
genetic factors associated with a certain phenotype, such as a disease; see Balding (2006)
for a review of statistical approaches to population association studies. Single-Nucleotide
Polymorphisms (SNPs) are the most common form of variation in the human genome (see
Hirschhorn and Daly, 2005). A SNP is a change in one nucleotide at a given genomic
position. Commonly, SNPs are diallelic with two of the four bases A (adenine), C (cytosine),
T (thymine) and G (guanine) occurring at the considered locus where it is possible that a
wild allele, W, is substituted by a mutant allele, M. Hence, a SNP has three possible
genotypes, WW, WM and MM, and can be represented as a three-level discrete variable.
The latter representation makes it possible to identify the codominant genotype effect of the 
SNP on the phenotype, but the relevant phenotype may also be associated with alternative 
representations of the SNP. In the dominant genotype model, heterozygote individuals are
expected to have the same phenotype as MM homozygote individuals so that the levels
WM and MM are collapsed into a single one to give WM + MM vs. WW. Conversely,
in the recessive genotype model the three levels are dichotomized as WM + WW vs. MM. 
In general, none of these three genetic models for a specific SNP is favored a priori, and 
the research question also concerns the identification of the most appropriate representation 
of a SNP. Clearly, there is a loss of efficiency in fitting a different statistical model for 
every possible genetic model, and this is especially true when more SNPs are simultaneously 
considered. 

In this paper we extend the Log-Mean Linear (LML) parameterization introduced by
Roverato et al. (2013) for binary data to discrete variables with an arbitrary number of levels
and show that also in this case it can be used to parameterize bi-directed graph models.
Furthermore, we show that the LML parameterization satisfies a stronger version of upward
compatibility that we call dichotomization invariance. Every LML parameter can be
uniquely associated with a cell either of the cross-classified table or of a marginal table and
it is invariant with respect to both marginalization and collapsing operations that do not
involve such a cell. In this way, the LML parameterization allows one to simultaneously
represent marginal independencies among variables and marginal independencies that only
appear when certain levels are collapsed into a single one. This feature is useful in several
applied contexts, but it also provides a natural way to reduce the parameter count, while
preserving the independence structure, by means of substantive constraints that give
additional insight into the dependence structure of the variables. Being able to implement additional
substantive constraints so as to specify parsimonious submodels is a key issue in graphical
modelling and, more generally, in multivariate analysis (see, for instance, Højsgaard and
Lauritzen, 2008) but it is especially relevant in marginal modelling because the number
of parameters in a bi-directed graph model can be relatively large even for sparse graphs
(Richardson, 2009; Evans and Richardson, 2012).

This paper is organized as follows. Sections 2 and 3 provide a review of the theory of
discrete bi-directed graph models and of the associated parameterizations as required for
this paper. Section 4 contains the extension of the LML parameterization to the general
discrete case, whereas in Section 5 we introduce the binary expansion operation and state
the connected set Markov property for B-expanded graphs. Finally, Section 6 contains a
brief discussion.



2 Bi-directed graph models 

Let $Y_V = (Y_v)_{v \in V}$ be a random vector with entries indexed by a finite set $V$. In graphical
models of marginal independence every variable $Y_v$, with $v \in V$, is associated with a vertex
of a bi-directed graph $\mathcal{G} = (V, E)$. The edge set $E$ is a collection of unordered pairs
of distinct vertices and every edge $\{i,j\} \in E$ is represented as a line with two arrowheads, $i \leftrightarrow j$.
A graph is complete if every pair of vertices is joined by an edge. A subset $U \subseteq V$
induces a subgraph $\mathcal{G}_U = (U, E_U)$ where $E_U = E \cap (U \times U)$. If $\mathcal{G}_U$ is disconnected we say
that $U$ is a disconnected set of $\mathcal{G}$ and denote by $C_1, \ldots, C_r$ its inclusion maximal connected
sets that we call the connected components of $U$; see Richardson (2003) for details. Recall
that $U = C_1 \,\dot\cup\, \cdots \,\dot\cup\, C_r$ where the symbol $\dot\cup$ denotes a union of disjoint sets.

A bi-directed graph model is the family of probability distributions for $Y_V$ that satisfy a
given Markov property with respect to a bi-directed graph $\mathcal{G} = (V,E)$. The distribution of
$Y_V$ is said to satisfy the pairwise Markov property with respect to $\mathcal{G}$ if for every $\{i,j\} \notin E$,
with $i \neq j$, it holds that $Y_i$ is independent of $Y_j$; in symbols $Y_i \perp\!\!\!\perp Y_j$. The distribution of
$Y_V$ satisfies the global Markov property, also called the connected set Markov property by
Richardson (2003), if for every disconnected set $U$ of $\mathcal{G}$, the subvectors corresponding to its
connected components $Y_{C_1}, \ldots, Y_{C_r}$ are mutually independent; $Y_{C_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp Y_{C_r}$. We remark
that the connected set Markov property implies the pairwise Markov property, whereas the
converse is not true in general, even for strictly positive distributions.

Figure 1: Bi-directed 4-chain.

Example 2.1 (bi-directed 4-chain). In the bi-directed graph of Figure 1 the disconnected
sets are {1,3}, {1,4}, {2,4}, {1,2,4} and {1,3,4}. Under the connected set Markov property,
the sets {1,2,4} and {1,3,4} encode the independencies $Y_{\{1,2\}} \perp\!\!\!\perp Y_4$ and $Y_1 \perp\!\!\!\perp Y_{\{3,4\}}$,
respectively, and these imply the independencies encoded by each of the remaining
disconnected sets.
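The disconnected sets of a small graph can be enumerated by brute force. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the edge list 1 <-> 2 <-> 3 <-> 4 is an assumption about the graph in Figure 1, chosen because it is consistent with the five disconnected sets listed in Example 2.1.

```python
from itertools import combinations


def connected_components(vertices, edges):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for a, b in edges:
        # Keep only the edges with both endpoints in the induced subgraph.
        if a in vertices and b in vertices:
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for v in sorted(vertices):
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components


def disconnected_sets(vertices, edges):
    """All subsets U of the vertex set whose induced subgraph is disconnected."""
    out = []
    for r in range(2, len(vertices) + 1):
        for U in combinations(sorted(vertices), r):
            if len(connected_components(U, edges)) > 1:
                out.append(set(U))
    return out


# Assumed edge set of the bi-directed 4-chain.
chain_vertices = [1, 2, 3, 4]
chain_edges = [(1, 2), (2, 3), (3, 4)]
chain_disconnected = disconnected_sets(chain_vertices, chain_edges)
chain_components_124 = connected_components({1, 2, 4}, chain_edges)
```

For the set {1,2,4} the components are {1,2} and {4}, which is the independence between the subvector indexed by {1,2} and the variable indexed by 4 stated in the example.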



3 Parametrizations of discrete bi-directed graph models 

3.1 The Möbius and LML parameters for the binary case

In order to highlight that a variable is binary we denote it by $X$ and, without loss of
generality, we assume that $X \in \{0, 1\}$ so that $X_V = (X_v)_{v \in V}$ is a multivariate Bernoulli
random vector taking values in $\{0, 1\}^p$, with $p = |V|$. From the fact that
$\{0, 1\}^p = \{(1_U, 0_{V \setminus U}) \mid U \subseteq V\}$
it follows that one can write the probability distribution of $X_V$ as a vector $\pi = (\pi_U)_{U \subseteq V}$
with entries $\pi_U = \mathrm{pr}(X_U = 1_U, X_{V \setminus U} = 0_{V \setminus U})$.

The multivariate Bernoulli distribution belongs to the natural exponential family with
mean parameter $\mu = (\mu_U)_{U \subseteq V}$ where

$\mu_\emptyset = 1$ and $\mu_U = \mathrm{pr}(X_U = 1_U)$ for every $U \subseteq V$ with $U \neq \emptyset$. (1)



The mean parameter $\mu$ was called the Möbius parameter by Drton and Richardson (2008)
because $\mu_U = \sum_{E \subseteq V \setminus U} \pi_{U \cup E}$ for every $U \subseteq V$ so that the inverse map $\mu \mapsto \pi$ can be
computed by Möbius inversion as $\pi_U = \sum_{E \subseteq V \setminus U} (-1)^{|E|} \mu_{U \cup E}$ for every $U \subseteq V$; see
Lauritzen (1996, Appendix A). Let $Z$ and $M$ be two $(2^p \times 2^p)$ matrices with entries indexed by the
subsets of $V \times V$ and given by $Z_{U,H} = 1(U \subseteq H)$ and $M_{U,H} = (-1)^{|H \setminus U|}\, 1(U \subseteq H)$,
respectively, where $1(\cdot)$ denotes the indicator function. Then, one can write the linear relationship
between $\mu$ and $\pi$ in matrix form as $\mu = Z\pi$ and $\pi = M\mu$, and Möbius inversion follows by
noticing that $M = Z^{-1}$.
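As a small numerical check of the relationship between the two matrices, the sketch below builds the zeta-type matrix Z and the Möbius matrix M for a two-element vertex set and multiplies them. The helper names are hypothetical and the ordering of the subsets (by size, then lexicographically) is an arbitrary choice made for illustration.

```python
from itertools import combinations


def subsets(V):
    """All subsets of V, ordered by size and then lexicographically."""
    V = sorted(V)
    return [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]


def zeta_matrix(V):
    """Z[U][H] = 1 if U is a subset of H, 0 otherwise."""
    S = subsets(V)
    return [[1 if U <= H else 0 for H in S] for U in S]


def mobius_matrix(V):
    """M[U][H] = (-1)^{|H \\ U|} if U is a subset of H, 0 otherwise."""
    S = subsets(V)
    return [[(-1) ** len(H - U) if U <= H else 0 for H in S] for U in S]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


V = {1, 2}
Z = zeta_matrix(V)
M = mobius_matrix(V)
identity = [[int(r == c) for c in range(len(Z))] for r in range(len(Z))]
product = matmul(M, Z)  # expected to equal the identity matrix
```

The same check can be repeated for larger vertex sets, at the cost of matrices of size 2^p by 2^p.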



Roverato et al. (2013) introduced the Log-Mean Linear (LML) parameter $\gamma = (\gamma_U)_{U \subseteq V}$
whose entries are computed as a log-linear expansion of the Möbius parameters

$\gamma_U = \sum_{E \subseteq U} (-1)^{|U \setminus E|} \log \mu_E$ for every $U \subseteq V$ (2)

so that, in matrix form, $\gamma = M^T \log \mu$ where $M^T$ denotes the transpose of $M$. Note that
$\gamma_\emptyset = 0$ and, furthermore, that $\pi$ can be analytically computed by applying Möbius inversion
twice to obtain $\pi = M \exp(Z^T \gamma)$. The LML parameter can be regarded as a linearization of
the Möbius parameter because certain multiplicative constraints in $\mu$ correspond to LML
interactions equal to zero as follows.
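The maps between the probability vector, the Möbius parameter and the LML parameter can be sketched for a small binary example as follows. This is an illustrative implementation written with subset-indexed dictionaries rather than matrices; the function names and the numerical values (two independent binary variables with success probabilities 0.3 and 0.6) are assumptions made for the example.

```python
import math
from itertools import combinations


def subsets(V):
    V = sorted(V)
    return [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]


def mobius_from_pi(pi, V):
    """mu_U = sum of pi_H over all H containing U (mu = Z pi)."""
    S = subsets(V)
    return {U: sum(pi[H] for H in S if U <= H) for U in S}


def lml_from_mobius(mu, V):
    """gamma_U = sum over E subset of U of (-1)^{|U \\ E|} log mu_E (gamma = M^T log mu)."""
    S = subsets(V)
    return {U: sum((-1) ** len(U - E) * math.log(mu[E]) for E in S if E <= U)
            for U in S}


def pi_from_lml(gamma, V):
    """Inverse map: mu = exp(Z^T gamma), then pi = M mu."""
    S = subsets(V)
    mu = {U: math.exp(sum(gamma[E] for E in S if E <= U)) for U in S}
    return {U: sum((-1) ** len(H - U) * mu[H] for H in S if U <= H) for U in S}


V = {1, 2}
p1, p2 = 0.3, 0.6  # assumed marginal probabilities of X_1 = 1 and X_2 = 1
pi = {
    frozenset(): (1 - p1) * (1 - p2),
    frozenset({1}): p1 * (1 - p2),
    frozenset({2}): (1 - p1) * p2,
    frozenset({1, 2}): p1 * p2,
}
mu = mobius_from_pi(pi, V)
gamma = lml_from_mobius(mu, V)
pi_back = pi_from_lml(gamma, V)
```

Because the two variables are independent by construction, the two-way entry of `gamma` is zero up to rounding error, and `pi_back` reproduces `pi`.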



Theorem 3.1 (Theorem 1 in Roverato et al. (2013)). Let $X_V$ be a vector of binary variables
with Möbius and LML parameters $\mu$ and $\gamma$, respectively. Then, for a pair of disjoint,
nonempty, proper subsets $A$ and $B$ of $V$, the following conditions are equivalent:

(i) $X_A \perp\!\!\!\perp X_B$;

(ii) $\mu_{A' \cup B'} = \mu_{A'} \times \mu_{B'}$ for every $A' \subseteq A$ and $B' \subseteq B$;

(iii) $\gamma_{A' \cup B'} = 0$ for every $A' \subseteq A$ and $B' \subseteq B$ such that $A' \neq \emptyset$ and $B' \neq \emptyset$.



The equivalence (i)$\Leftrightarrow$(ii) of Theorem 3.1 follows immediately from Theorem 1 of Drton
and Richardson (2008) where it is shown that, in the binary case, bi-directed graph models
can be parameterized by imposing multiplicative constraints on the Möbius parameters. The
latter result can be expressed in terms of the LML parameters as follows.



Theorem 3.2 (Theorem 2 in Roverato et al. (2013)). Let $\gamma = (\gamma_U)_{U \subseteq V}$ be the LML
parameter of the binary random vector $X_V$ and let $\mathcal{G} = (V, E)$ be a bi-directed graph. The
distribution of $X_V$ satisfies the connected set Markov property with respect to $\mathcal{G}$ if and only
if for every set $U \subseteq V$ that is disconnected in $\mathcal{G}$ it holds that $\gamma_U = 0$.

Example 3.1 (bi-directed 4-chain cont.). The disconnected sets of the bi-directed graph of
Figure 1 are {1,3}, {1,4}, {2,4}, {1,2,4} and {1,3,4} so that $X_V$ satisfies the connected
set Markov property with respect to such graph if and only if
$\gamma_{\{1,3\}} = \gamma_{\{1,4\}} = \gamma_{\{2,4\}} = \gamma_{\{1,2,4\}} = \gamma_{\{1,3,4\}} = 0$.



Theorems 3.1 and 3.2 are proved in Roverato et al. (2013) under the assumption that $\pi$ is
strictly positive, and this implies that $\gamma_U$ is well-defined for every $U \subseteq V$. Here, it is worth
remarking that, for every $U \subseteq V$, $\gamma_U$ is well-defined if and only if $\mu_U > 0$ because
$\mu_E \geq \mu_U$ for every $E \subseteq U$.



3.2 The general discrete case 

In this section, we consider the general case where the variables in $Y_V$ take on an arbitrary
number of levels that we label as $I_v = \{0_v, 1_v, \ldots, d_v\}$, for every $v \in V$. Hence, the state
space of $Y_V$ is the product set $\mathcal{I}_V = \times_{v \in V} I_v$ that, with a slight abuse of terminology, we call a
cross-classified table. Accordingly, the elements $i = i_V \in \mathcal{I}_V$ are called the cells of the table.
The probability distribution of $Y_V$ is characterized by the probability table $\varpi = (\varpi_i)_{i \in \mathcal{I}_V}$,
which we assume to be strictly positive. We remark that the symbol $\varpi$ is used to denote
the probability table of an arbitrary discrete random vector whereas $\pi$ is used only in the
binary case.
In the general discrete case, two different parameterizations of bi-directed graph models
are available. Lupparelli et al. (2009) showed that there exists a connection between
bi-directed graph models and the multivariate logistic parameterization of Glonek and
McCullagh (1995). Specifically, every multivariate logistic interaction is a log-linear parameter
computed in the relevant marginal distribution and a bi-directed graph model can be
specified by setting to zero all the interactions corresponding to the disconnected sets of the
graph; see also Rudas et al. (2010) and Evans and Richardson (2012). Parsimony can be
achieved by setting further interactions to zero, but such additional constraints are typically
difficult to interpret.



Drton (2009) generalized the Möbius parameters to include non-binary variables and
regression graph models. From the state space of $Y_v$, Drton (2009) introduced the restricted
state space defined as $J_v = I_v \setminus \{0_v\} = \{1_v, \ldots, d_v\}$ so that the restricted state space of $Y_V$ is
given by $\mathcal{J}_V = \times_{v \in V} J_v$. Hereafter, we refer to "$0_v$" as the baseline level of $Y_v$ and remark
that the choice of the level to be set as baseline is arbitrary. For every $U \subseteq V$, with $U \neq \emptyset$,
we denote by $\mathcal{I}_U$ and $\mathcal{J}_U$ the state space and the restricted state space, respectively, of $Y_U$.
Furthermore, for every $j \in \mathcal{J}_V$ we denote by $j_U$ the subset of levels of $j$ corresponding to
the entries of $Y_U$, so that we can write $j_U \subseteq j$, and it holds that $\mathcal{J}_U = \{j_U \mid j \in \mathcal{J}_V\}$. Finally,
when $U = \emptyset$ we use the convention that $j_\emptyset = \mathcal{J}_\emptyset = \emptyset$.



The saturated Möbius parameter of $Y_V$ was defined by Drton (2009) as the collection of
marginal probabilities

$\mathrm{pr}(Y_U = j_U)$ for every $j \in \mathcal{J}_V$ and $U \subseteq V$ with $U \neq \emptyset$. (3)



Drton (2009) showed that the saturated Möbius parameters characterize the distribution of $Y_V$
and that every bi-directed graph model is defined by an appropriate choice of multiplicative
constraints on the saturated Möbius parameters. However, it is not clear how parsimony can
be achieved with this parameterization. We also recall that the saturated Möbius parameters
are closely related to the dependence ratios of Ekholm et al. (2000).



4 The LML parameterization for the general discrete case 



In this section we extend the LML parameterization of Roverato et al. (2013) to discrete
random variables with an arbitrary number of levels and show that, also in this case, every
bi-directed graph model can be defined by setting certain LML interactions to zero.

For every $v \in V$ and $i_v \in I_v$ we introduce the Bernoulli random variable $X_v^{i_v}$ defined as

$X_v^{i_v} = \begin{cases} 1 & \text{if } Y_v = i_v \\ 0 & \text{otherwise.} \end{cases}$ (4)

In this way, every cell $i \in \mathcal{I}_V$ is associated with the random vector $X_V^i = (X_v^{i_v})_{v \in V}$ which
follows a multivariate Bernoulli distribution, and we denote by $\pi^i = (\pi^i_U)_{U \subseteq V}$ the
corresponding probability parameter. Accordingly, the mean and LML parameter of $X_V^i$ can be
computed as $\mu^i = Z\pi^i$ and $\gamma^i = M^T \log \mu^i$, respectively.
It is straightforward to see, from (1) and (4), that

$\mu^i_U = \mathrm{pr}(X^i_U = 1_U) = \mathrm{pr}(Y_U = i_U)$ for every $i \in \mathcal{I}_V$ and $U \subseteq V$ with $U \neq \emptyset$, (5)

and it follows that the Möbius parameter in (3) can be written as a collection of vectors
of Möbius parameters for binary vectors; formally $\mu^j$ for $j \in \mathcal{J}_V$. In this way one can see
that the collection of LML parameters $\gamma^j$, for $j \in \mathcal{J}_V$, parameterizes the distribution of
$Y_V$, because there exists a bijective map between $\gamma^j$ and $\mu^j$ for every $j \in \mathcal{J}_V$. For every
$i, i' \in \mathcal{I}_V$ and $U \subseteq V$ such that $i_U = i'_U$ it holds both that $\mu^i_U = \mu^{i'}_U$ and that $\gamma^i_U = \gamma^{i'}_U$ and,
to remove redundancies, we write $\mu^{i_U} = \mu^i_U$ and $\gamma^{i_U} = \gamma^i_U$ so that the Möbius and the LML
parameters of $Y_V$ can be formally defined as

$\mu = (\mu^{j_U})_{j \in \mathcal{J}_V,\, U \subseteq V}$ and $\gamma = (\gamma^{j_U})_{j \in \mathcal{J}_V,\, U \subseteq V}$,

respectively.
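A single entry of the LML parameter in the general discrete case can be computed directly from the probability table, by combining the marginal probabilities in (5) with the alternating sum in (2). The sketch below assumes a hypothetical data layout in which the table is a dictionary from cells (tuples of levels, one per variable) to probabilities, and an assignment is a dictionary from variable positions to levels; the numerical values are made up and describe two independent variables.

```python
import math
from itertools import combinations


def marginal_prob(table, assignment):
    """pr(Y_U = j_U), where `assignment` maps variable position -> level."""
    return sum(p for cell, p in table.items()
               if all(cell[v] == lev for v, lev in assignment.items()))


def lml_entry(table, assignment):
    """LML entry for the levels in `assignment`: alternating sum of log marginals."""
    U = sorted(assignment)
    total = 0.0
    for r in range(len(U) + 1):
        for E in combinations(U, r):
            mu_E = marginal_prob(table, {v: assignment[v] for v in E})
            total += (-1) ** (len(U) - r) * math.log(mu_E)
    return total


# Assumed example: Y_0 has levels 0, 1, 2 and Y_1 has levels 0, 1 (0 = baseline).
pa = {0: 0.2, 1: 0.3, 2: 0.5}
pb = {0: 0.6, 1: 0.4}
table = {(a, b): pa[a] * pb[b] for a in pa for b in pb}

main_effect = lml_entry(table, {0: 2})          # log pr(Y_0 = 2)
interaction = lml_entry(table, {0: 2, 1: 1})    # zero under independence
```

Since the table is a product of its margins, every two-way entry vanishes up to rounding error, in line with the zero constraints that characterize marginal independence.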

The fact that both the Möbius and the LML parameter for the general case are defined
as the collection of Möbius and LML parameters, respectively, for the collection of binary
vectors $X_V^j$ with $j \in \mathcal{J}_V$ represents a key feature that confers useful properties to our
approach. For instance, the generalization of relevant properties for these parameterizations
follows immediately from the iterative application, for every $j \in \mathcal{J}_V$, of the corresponding
properties of binary vectors. This is the case of the result of Roverato et al. (2013), given
in Theorem 3.1.



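Theorem 4.1. Let $\mu$ and $\gamma$ be the Möbius and LML parameters of $Y_V$, respectively. Then
for a pair of disjoint, nonempty, proper subsets $A$ and $B$ of $V$, the following conditions are
equivalent:

(i) $Y_A \perp\!\!\!\perp Y_B$;

(ii) $\mu^{j_{A' \cup B'}} = \mu^{j_{A'}} \times \mu^{j_{B'}}$ for every $j \in \mathcal{J}_V$, $A' \subseteq A$ and $B' \subseteq B$;

(iii) $\gamma^{j_{A' \cup B'}} = 0$ for every $j \in \mathcal{J}_V$, $A' \subseteq A$ and $B' \subseteq B$ such that $A' \neq \emptyset$ and $B' \neq \emptyset$.

Proof. See Appendix A. □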



Theorem 4.1 can be applied to show that bi-directed graph models can be parameterized
by setting to zero the LML interactions corresponding to the disconnected sets of $\mathcal{G}$,
generalizing in this way the result of Roverato et al. (2013) given in Theorem 3.2.



Corollary 4.2. Let $\gamma = (\gamma^{j_U})_{j \in \mathcal{J}_V,\, U \subseteq V}$ be the LML parameter of $Y_V$ and let $\mathcal{G} = (V,E)$ be
a bi-directed graph. The distribution of $Y_V$ satisfies the connected set Markov property with
respect to $\mathcal{G}$ if and only if for every set $U \subseteq V$ that is disconnected in $\mathcal{G}$ it holds that $\gamma^{j_U} = 0$
for every $j_U \in \mathcal{J}_U$.

Proof. See Appendix A. □
Example 4.1 (bi-directed 4-chain cont.). As in Example 3.1 we consider the graph $\mathcal{G}$
in Figure 1. Then, by Corollary 4.2, the distribution of $Y_V$ satisfies the connected set
Markov property with respect to $\mathcal{G}$ if and only if it holds that $\gamma^{j_U} = 0$ for every
$U \in \{\{1,3\}, \{1,4\}, \{2,4\}, \{1,2,4\}, \{1,3,4\}\}$ and $j_U \in \mathcal{J}_U$, i.e., if and only if
$\gamma^{j_{\{1,3\}}} = \gamma^{j_{\{1,4\}}} = \gamma^{j_{\{2,4\}}} = \gamma^{j_{\{1,2,4\}}} = \gamma^{j_{\{1,3,4\}}} = 0$
for every $j \in \mathcal{J}_V$. Clearly, if $Y_V$ is binary then $|\mathcal{J}_V| = 1$ and
the zero LML interactions are the same as in Example 3.1.



The discrete bi-directed graph model for $Y_V$ with graph $\mathcal{G} = (V, E)$ is defined as the set of
positive probability distributions for $Y_V$ that obey the connected set Markov property with
respect to $\mathcal{G}$. Drton (2009, Corollary 10) showed that discrete bi-directed graph models are
curved exponential families. The simple nature of the mapping $\varpi \mapsto \gamma$ allows one to see
that $\gamma$ is a smooth parameterization of the saturated model and therefore that any model
defined by imposing linear constraints on the LML parameter of the saturated model is
a curved exponential family. The family of submodels defined by linear constraints on $\gamma$
includes discrete bi-directed graph models by Corollary 4.2, and in the next section we will
see how additional zero constraints on the LML parameters can be specified so as to obtain
interpretable bi-directed graph submodels.

Maximum likelihood estimation for LML models under a multinomial or Poisson sampling
scheme is a constrained optimization problem that can be carried out by using
gradient-based ascent methods; we refer to Lang (1996) and Bergsma et al. (2009) for
details. We remark that the likelihood function can be expressed in terms both of Möbius
parameters and of LML parameters but not analytically in terms of multivariate logistic
parameters because an analytic form of the inverse map to compute $\varpi$ from the marginal
logistic parameters is not available; see also Roverato et al. (2013).



5 Dichotomization invariance and B-expanded graphs

The LML parameterization is based on the collection of binary vectors $X_V^j$ for $j \in \mathcal{J}_V$ and
it is useful to take a closer look at how these parameters are computed. For every $i \in \mathcal{I}_V$
the computation of $\mu^i$ is based on the probability table $\pi^i$ of $X_V^i$ and one should observe
that $\pi^i$ is obtained by collapsing the levels of $Y_V$ in such a way that the probability in the
cell $i \in \mathcal{I}_V$ is not affected by this operation; formally, $\varpi_i = \pi^i_V$. For this reason, we say
that $\pi^i$ is constructed by collapsing the levels of $Y_V$ "around" the cell $i$, and this trivially
implies that the value of $\mu^i$, as well as that of $\gamma^i$, is unaffected by collapsing operations that
do not involve the level $i_v$ for every $v \in V$. As a consequence of this invariance property of
the LML parameterization, certain zero entries in the LML parameter $\gamma$ of $Y_V$ allow one to
identify marginal independencies concerning dichotomized versions of the variables, as well
as to identify levels of the variables that can be collapsed without affecting the structure
of the associated bi-directed graph. We formally approach this issue by introducing the
concept of binary expansion of a discrete variable, that is based on the dichotomization (4).
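The invariance with respect to collapsing can be checked numerically on a small table. The sketch below is an illustration under assumptions: the table layout (cells as tuples of levels), the helper names and the probabilities are all hypothetical, and the check concerns a single LML entry whose levels are not touched by the collapsing operation.

```python
import math
from itertools import combinations


def marginal_prob(table, assignment):
    """pr(Y_U = j_U), where `assignment` maps variable position -> level."""
    return sum(p for cell, p in table.items()
               if all(cell[v] == lev for v, lev in assignment.items()))


def lml_entry(table, assignment):
    """Alternating sum of log marginal probabilities over subsets of U."""
    U = sorted(assignment)
    total = 0.0
    for r in range(len(U) + 1):
        for E in combinations(U, r):
            mu_E = marginal_prob(table, {v: assignment[v] for v in E})
            total += (-1) ** (len(U) - r) * math.log(mu_E)
    return total


def collapse(table, var, levels, new_level):
    """Merge the given levels of variable `var` into a single level."""
    out = {}
    for cell, p in table.items():
        cell = list(cell)
        if cell[var] in levels:
            cell[var] = new_level
        key = tuple(cell)
        out[key] = out.get(key, 0.0) + p
    return out


# Assumed 3 x 2 table with dependent variables (probabilities sum to one).
table = {(0, 0): 0.10, (0, 1): 0.15,
         (1, 0): 0.20, (1, 1): 0.05,
         (2, 0): 0.30, (2, 1): 0.20}

gamma_before = lml_entry(table, {0: 2, 1: 1})
collapsed = collapse(table, 0, {0, 1}, "0+1")   # levels 0 and 1 of Y_0 merged
gamma_after = lml_entry(collapsed, {0: 2, 1: 1})
```

The entry associated with level 2 of the first variable is the same before and after merging levels 0 and 1, because every marginal probability entering its computation is left unchanged by the merge.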
Definition 5.1. For $v \in V$, the binary expansion of $Y_v$ with respect to $J_v$ is the
$|J_v|$-dimensional random vector of binary variables $X_{J_v} = (X_v^{j_v})_{j_v \in J_v}$.

The binary expansion $X_{J_v}$ provides an alternative representation of $Y_v$: every entry of
$X_{J_v}$ corresponds to a different dichotomization of $Y_v$ so that, for every $j_v \in J_v$, the variable
$X_v^{j_v}$ takes on the value 1 if and only if $Y_v = j_v$ and value 0 otherwise. Moreover, $X_{J_v} = 0_{J_v}$
if and only if $Y_v = 0_v$. Clearly, there exist $|I_v|$ different binary expansions of $Y_v$, depending
on the choice of the baseline level, and they are all equivalent, in the sense that there is a
one-to-one relationship between $Y_v$ and any of its binary expansions. On the other hand, the
specification of one binary expansion, out of the $|I_v|$ existing alternatives, amounts to fixing
a particular perspective from which the variable structure is explored. A suitable choice of
the baseline level may correspond to a binary expansion of special interest and, ultimately,
make it possible to disclose relevant additional features concerning the association structure
of the variables.

Example 5.1 (Genetic association analysis cont.). Let $Y_v$ be the variable representing
a given SNP under the codominant genotype model so that $Y_v$ takes values in the set
{WM, WW, MM}. There exist three different binary expansions of $Y_v$. However, if one
sets the genotype WM as baseline level, then every entry of the resulting binary expansion
$X_{J_v}$ has a clear interpretation. Indeed, one of the two entries is associated with WW and
therefore it corresponds to the "dominant" dichotomization WM + MM vs. WW whereas
the other entry is associated with MM and corresponds to the "recessive" dichotomization
WM + WW vs. MM. In order to make our notation more intuitive, hereafter for a SNP $Y_v$
we label WW as $D_v$ (for Dominant), and MM as $R_v$ (for Recessive) so that $J_v = \{D_v, R_v\}$
and $X_{J_v} = (X_v^{D_v}, X_v^{R_v})^T$.

We now turn to the variable relative to the phenotype or trait of interest. It is common
for a genetic association study to be based on a case-control design where the phenotype $Y_v$
is a discrete variable with levels corresponding to the controls and to the different states of
a given disease for the cases. Hence, by setting the controls as baseline level, every entry of
the resulting binary expansion $X_{J_v}$ corresponds to one of the different states of the disease.
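The binary expansion of a single variable, and its one-to-one relationship with the original variable, can be sketched as follows. The function names are hypothetical; the genotype labels and the choice of WM as baseline are taken from the example above.

```python
def binary_expansion(value, levels, baseline):
    """Map a level of Y_v to its 0/1 expansion indexed by the non-baseline levels."""
    if value not in levels or baseline not in levels:
        raise ValueError("value and baseline must both be levels of the variable")
    return {j: int(value == j) for j in levels if j != baseline}


def from_expansion(expansion, baseline):
    """Inverse map: recover the level of Y_v from its binary expansion."""
    ones = [j for j, x in expansion.items() if x == 1]
    if len(ones) > 1:
        raise ValueError("at most one entry of a binary expansion can equal 1")
    return ones[0] if ones else baseline


GENOTYPES = ["WM", "WW", "MM"]

# With WM as baseline, the WW entry is the "dominant" dichotomization
# (WM + MM vs. WW) and the MM entry is the "recessive" one (WM + WW vs. MM).
expansions = {g: binary_expansion(g, GENOTYPES, "WM") for g in GENOTYPES}
recovered = {g: from_expansion(x, "WM") for g, x in expansions.items()}
```

The baseline genotype is mapped to the all-zero vector, and no two entries of an expansion can be equal to one at the same time.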

The concept of binary expansion of a variable can then be extended to that of binary
expansion of a random vector $Y_B$, $B \subseteq V$, with respect to $J_B = \bigcup_{v \in B} J_v$ that is given by
$X_{J_B} = (X_v^{j_v})_{j_v \in J_v,\, v \in B}$. In the rest of the paper, we assume, without loss of generality,
that $J_B$ is fixed and shortly write that $X_{J_B}$ is the binary expansion of $Y_B$. Furthermore, if
$P = V \setminus B$ then we write $Y^B = (Y_P, X_{J_B})$ and say that $Y^B$ is the B-expansion of $Y_V$; note
that $Y^V = X_{J_V}$ whereas, if $B = \emptyset$ then $J_B = \emptyset$ and $Y^\emptyset = Y_V$.

The main result of this section is a generalization of Corollary 4.2 where we show that,
for every $B \subseteq V$, bi-directed graph models for $Y^B$ can be parameterized by setting certain
entries of $\gamma$ to zero. We emphasize that here $\gamma$ is the LML parameter of $Y_V$ and therefore
the subset $B$ plays no role in its computation.
Figure 2: Example of complete expanded graphs, panels (a), (b) and (c), for variables $Y_1$ (1-disease)
and $Y_2$ (2-SNP) with $J_1 = \{I, II, III\}$ and $J_2 = \{D_2, R_2\}$.

The marginal independence structure of $Y^B = (Y_P, X_{J_B})$ is encoded by a graph with vertex set $P \cup J_B$.
By construction, for every $v \in V$, no marginal independence is present between the entries
of $X_{J_v}$ because $X_v^{j_v} = 1$ implies that $X_{J_v \setminus \{j_v\}} = 0_{J_v \setminus \{j_v\}}$, for every $j_v \in J_v$. Hence, the
bi-directed graph of $Y^B$ is a B-expanded graph, formally defined below.

Definition 5.2. Let $Y_V$ be a discrete random vector and $V = P \cup B$ a partition of $V$. We
say that $\mathcal{G}^B$ is a B-expanded graph for $Y_V$ if it is a bi-directed graph with vertex set equal
to $P \cup J_B$, and such that the subgraph induced by $J_v$ is complete for every $v \in B$.

Example 5.2 (Genetic association analysis cont.). Consider a case-control genetic association
study where $Y_1$ is a variable whose baseline level corresponds to the controls and
$J_1 = \{I, II, III\}$ encodes three different stages of a disease measured on the cases.
Furthermore, let $Y_2$ be a SNP and $J_2 = \{D_2, R_2\}$. The graphs in Figure 2 represent: (a)
the complete B-expanded graph with $B = \emptyset$, that is, the complete bi-directed graph with
vertex set {1,2}, (b) the complete B-expanded graph with $B = \{2\}$ and (c) the complete
B-expanded graph with $B = \{1,2\}$. Note that, to improve readability, the gray color is used
to represent the complete subgraphs of the expanded variables.

Before stating the connected set Markov property on B-expanded graphs it is convenient
to introduce the notion of primary subsets.

Definition 5.3. Let $J_B = \bigcup_{v \in B} J_v$ where $B \subseteq V$ and $P = V \setminus B$. We say that $K$ is a
primary subset of $J_B$ if $K \subseteq J_B$ and it contains at most one level for every variable in $Y_B$;
formally, $|K \cap J_v| \leq 1$ for every $v \in B$. Furthermore, we say that $L$ is a primary subset of
$P \cup J_B$ if $L \subseteq P \cup J_B$ and $K = L \cap J_B$ is a primary subset of $J_B$.

Note that the empty set is always primary and, furthermore, if $B = \emptyset$ then $P = V$ and,
in this case, every subset of $P$ is primary.
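Primary subsets can be enumerated directly from the definition. The sketch below uses a hypothetical representation in which `J` maps every expanded variable to the list of its non-baseline levels and level labels are assumed to be unique across variables; the example corresponds to expanding both the three-stage disease variable and the SNP of Example 5.2.

```python
from itertools import combinations


def is_primary(L, J):
    """True if L contains at most one level of every expanded variable in J."""
    return all(len(set(L) & set(levels)) <= 1 for levels in J.values())


def primary_subsets(P, J):
    """All primary subsets of the vertex set made of P and the levels in J."""
    vertices = list(P) + [lev for levels in J.values() for lev in levels]
    return [frozenset(c)
            for r in range(len(vertices) + 1)
            for c in combinations(vertices, r)
            if is_primary(c, J)]


# Both variables expanded: P is empty and the vertices are the five levels.
P = []
J = {1: ["I", "II", "III"], 2: ["D2", "R2"]}

all_primary = primary_subsets(P, J)
multi_vertex = [L for L in all_primary if len(L) > 1]
```

With both variables expanded there are (3 + 1) x (2 + 1) = 12 primary subsets, the empty set included, and exactly six of them involve more than one vertex, one for every pair made of a disease stage and a SNP level.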

Lemma 5.1. Every primary subset $L$ of $P \cup J_B$ can be partitioned as $L = Q \cup K$ where
$Q = P \cap L$ and $K = J_B \cap L$. Moreover, there exists a unique subset $D \subseteq B$ such that
$K \in \mathcal{J}_D$ and, for this reason, we can write $K = j_D$ and $L = Q \cup j_D$. Conversely, for every
$Q \subseteq P$, $D \subseteq B$ and $j_D \in \mathcal{J}_D$ it holds that $Q \cup j_D$ is a primary subset of $P \cup J_B$.

Proof. This is a straightforward consequence of the fact that if $L$ is primary and $K = L \cap J_B$
then $|K \cap J_v| \leq 1$ for every $v \in B$. □

We can now state the main result of this section. 

Theorem 5.2. Let $\gamma = (\gamma^{j_U})_{j \in \mathcal{J}_V,\, U \subseteq V}$ be the LML parameter of $Y_V$ and let
$\mathcal{G}^B = (P \cup J_B, E^B)$ be a B-expanded graph for $Y_V$. The distribution of $Y^B = (Y_P, X_{J_B})$ satisfies the
connected set Markov property with respect to $\mathcal{G}^B$ if and only if for every set $L \subseteq P \cup J_B$
such that

(i) $L$ is disconnected in $\mathcal{G}^B$,

(ii) $L$ is a primary subset of $P \cup J_B$,

it holds that $\gamma^{j_{Q \cup D}} = 0$ for every $j_Q \in \mathcal{J}_Q$, where $L = Q \cup j_D$ as in Lemma 5.1 and
$j_{Q \cup D} = j_Q \cup j_D$.

Proof. See Appendix A. □

The following example clarifies the connection between LML parameters and edges of 
expanded graphs in the simple case where only two variables are considered. Interestingly, 
when both variables are expanded, every edge of the expanded graph can be associated with 
exactly one LML parameter. 

Example 5.3 (Genetic association analysis cont.). Let $Y_1$ and $Y_2$ be the two variables
depicted in the graphs of Figure 2. The LML parameter of $Y_{\{1,2\}}$ is made up of: $\gamma^{\emptyset} = 0$;
the main effects of $Y_1$ that are $\gamma^{I}$, $\gamma^{II}$ and $\gamma^{III}$; the main effects of $Y_2$ given by $\gamma^{D_2}$
and $\gamma^{R_2}$ and, finally, the two-way interactions $\gamma^{\{I,D_2\}}$, $\gamma^{\{II,D_2\}}$, $\gamma^{\{III,D_2\}}$, $\gamma^{\{I,R_2\}}$, $\gamma^{\{II,R_2\}}$
and $\gamma^{\{III,R_2\}}$. It follows from Corollary 4.2 that the edge $1 \leftrightarrow 2$ in the graph (a) of Figure 2
can be removed if and only if all of the six two-way LML interactions are equal to zero,
so that $Y_1 \perp\!\!\!\perp Y_2$. Consider now the graph (b) in the same figure. Here, there are only two
primary subsets involving more than one vertex, $\{1, D_2\}$ and $\{1, R_2\}$. The edge $1 \leftrightarrow D_2$
encodes the marginal association between $Y_1$ and the dominant version $X_2^{D_2}$ of the SNP $Y_2$
and, by Theorem 5.2, it can be removed if and only if $\gamma^{\{I,D_2\}} = \gamma^{\{II,D_2\}} = \gamma^{\{III,D_2\}} = 0$.
Similarly, the edge $1 \leftrightarrow R_2$ can be removed if and only if $\gamma^{\{I,R_2\}} = \gamma^{\{II,R_2\}} = \gamma^{\{III,R_2\}} = 0$.
It is therefore clear that $Y_1$ is independent of the codominant representation $Y_2$ of the SNP
if and only if it is independent both of the dominant representation $X_2^{D_2}$ and the recessive
representation $X_2^{R_2}$. Of main interest is the case where only one edge is missing in the graph
(b) because the B-expanded graph provides additional insight into the independence structure of
the two variables with respect to the traditional graph for $Y_{\{1,2\}}$. We now turn to the
B-expanded graph (c) in Figure 2. In this case, $P = \emptyset$, $J_B = \{I, II, III, D_2, R_2\}$ and the
primary subsets of $J_B$ which involve more than one vertex are $\{I, D_2\}$, $\{II, D_2\}$, $\{III, D_2\}$,
$\{I, R_2\}$, $\{II, R_2\}$ and $\{III, R_2\}$. Hence, it follows from Theorem 5.2 that every edge of the
graph can be removed if and only if the corresponding two-way interaction is equal to zero.
For instance, the edge $II \leftrightarrow R_2$ can be removed if and only if $\gamma^{\{II,R_2\}} = 0$.
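The correspondence between missing edges of an expanded graph and zero LML interactions can be sketched by listing the subsets that are both primary and disconnected and expanding the unexpanded vertices into their non-baseline levels. The code below is one possible reading of the conditions of Theorem 5.2, written under assumptions: the function names and the data layout are hypothetical, vertex and level labels are assumed to be unique strings, and the graph is graph (b) of the example with the edge between the disease and the recessive version of the SNP removed.

```python
from itertools import combinations, product


def is_connected(vertices, edges):
    """True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    adj = {v: set() for v in vertices}
    for a, b in edges:
        if a in vertices and b in vertices:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


def zero_constraints(P_levels, J_B, edges):
    """Level sets whose LML entry is zero for the given expanded graph.

    P_levels: unexpanded variable (a graph vertex) -> its non-baseline levels.
    J_B: expanded variable -> its non-baseline levels (each level is a vertex).
    """
    vertices = list(P_levels) + [lev for levs in J_B.values() for lev in levs]
    zeros = []
    for r in range(2, len(vertices) + 1):
        for L in combinations(vertices, r):
            primary = all(len(set(L) & set(levs)) <= 1 for levs in J_B.values())
            if not primary or is_connected(L, edges):
                continue
            Q = [v for v in L if v in P_levels]
            j_D = [v for v in L if v not in P_levels]
            # One zero entry for every choice of levels of the variables in Q.
            for j_Q in product(*(P_levels[v] for v in Q)):
                zeros.append(frozenset(j_Q) | frozenset(j_D))
    return zeros


P_levels = {"1": ["I", "II", "III"]}       # disease variable, not expanded
J_B = {"2": ["D2", "R2"]}                  # SNP, expanded
edges = [("D2", "R2"), ("1", "D2")]        # edge between "1" and "R2" removed
zeros = zero_constraints(P_levels, J_B, edges)
```

For this graph the only primary and disconnected subset is the one made of the disease vertex and the recessive level, which yields the three zero interactions, one per disease stage, stated in the example.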

We now illustrate the potential of B-expanded graphs to provide interpretable
parsimonious bi-directed graph submodels.

Example 5.4 (Genetic association analysis cont.). Let $V = \{1,2,3\}$ where $Y_1$ and $Y_2$ are
the two variables in the graphs in Example 5.4 and $Y_3$ is an additional SNP. In this case, apart
from $\gamma^{\emptyset} = 0$, the LML parameter is made up of 35 entries, concretely: 7 main effects, 16 two-way
interactions and 12 three-way interactions. Assume that the probability distribution
of $Y_V$ is such that the following 18 LML parameters are equal to zero: $\gamma^{\{j_1,R_2\}}$, $\gamma^{\{j_1,R_3\}}$,
$\gamma^{\{D_2,D_3\}}$, $\gamma^{\{R_2,D_3\}}$, $\gamma^{\{D_2,R_3\}}$, $\gamma^{\{j_1,R_2,R_3\}}$, $\gamma^{\{j_1,R_2,D_3\}}$ and $\gamma^{\{j_1,D_2,R_3\}}$ for every $j_1 \in J_1$. It is
easy to see from Theorem 3.1 that in this case there are no pairwise marginal independencies
so that the distribution of $Y_V$ is associated with the complete graph (a) in Figure 3. However,
the distribution of $Y_V$ belongs to a parsimonious model that can be completely defined in
terms of marginal independence relationships. If we set $B = \{2,3\}$, then the zero LML
parameters above are associated with the subsets of $\{1\} \cup J_B$ which are both primary
and disconnected in the graph (b) of Figure 3. Hence, by Theorem 5.2, the distribution
of $Y_V^B$ satisfies the connected set Markov property with respect to the latter graph, which
implies, among others, the independence of the disease from the recessive versions of the SNPs,
$Y_1 \perp\!\!\!\perp (X^{R_2}, X^{R_3})$. Clearly, it would also be possible to expand $Y_1$, but this would make
the graph unnecessarily more complex because the zero structure of $\gamma$ does not allow us to
remove any edge with an endpoint in the expansion of this variable. However, if additional
interactions are equal to zero such as, for instance, $\gamma^{\{III,D_2\}}$, $\gamma^{\{II,D_3\}}$, $\gamma^{\{III,D_3\}}$, $\gamma^{\{III,D_2,D_3\}}$
and $\gamma^{\{II,D_2,D_3\}}$, then it makes sense to expand also $Y_1$ so as to obtain the $B$-expanded graph
(c) in Figure 3. This graph encodes marginal relationships involving single levels of $Y_1$, such
as $X^{III} \perp\!\!\!\perp (X^{D_2}, X^{R_2}, X^{D_3}, X^{R_3})$, which is equivalent to $X^{III} \perp\!\!\!\perp Y_{\{2,3\}}$. Note that, in the latter
model, 23 out of the 35 LML parameters are constrained to zero but the independence
structure of $Y_V$ is still represented by the complete graph.
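
The parameter count quoted in the example can be reproduced by enumeration. The sketch below assumes only what the example states, namely three non-baseline levels for $Y_1$ and two for each SNP; the dictionary `levels` and the label strings are illustrative names.

```python
from itertools import combinations, product

# Non-baseline levels of each variable, as described in the example.
levels = {1: ["I", "II", "III"], 2: ["D2", "R2"], 3: ["D3", "R3"]}

# One LML parameter gamma^{j_U} for every non-empty U and every j_U,
# i.e. one level per variable in U.
counts = {}
for size in range(1, len(levels) + 1):
    n = 0
    for U in combinations(levels, size):
        n += len(list(product(*(levels[v] for v in U))))
    counts[size] = n

print(counts)                 # number of main effects, two-way and three-way terms
print(sum(counts.values()))   # total number of entries apart from the empty set
```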

The rest of this section is devoted to some basic results that are required for the proof of
Theorem 5.2. These are rather technical but, nevertheless, of interest because they provide
a formal proof of the dichotomization invariance property of both the Möbius and the LML
parameterizations. It is worth remarking that most of the existing results for marginal
independence models, including Theorem 4.1 and Corollary 4.2, cannot be directly applied
to $Y_V^B$ because its distribution contains structural zeros. In fact, the following lemma shows
that the Markov property for $B$-expanded graphs is characterized by the same property for
the collection of subvectors $(Y_P, X_{j_B})$, for $j_B \in \mathcal{J}_B$, whose probability distributions have no
structural zeros.
Figure 3 (panels (a), (b), (c)): Example of expanded graphs for $Y_{\{1,2,3\}}$ with $J_1 = \{I, II, III\}$ and $J_i = \{D_i, R_i\}$
for $i = 2, 3$.

Lemma 5.3. Let $\mathcal{G}^B = (P \cup J_B, E^B)$ be a $B$-expanded graph for $Y_V$. The distribution of
$Y_V^B = (Y_P, X_{J_B})$ satisfies the connected set Markov property with respect to $\mathcal{G}^B$ if and only
if the distribution of $(Y_P, X_{j_B})$ satisfies the connected set Markov property with respect to
$\mathcal{G}^B_{P \cup j_B}$ for every $j_B \in \mathcal{J}_B$.

The probability table of $Y_B$ is made up of $|\mathcal{I}_B|$ positive probabilities that can be written
as $\mathrm{pr}(Y_B = i_B)$ for every $i_B \in \mathcal{I}_B$ or, equivalently, as $\mathrm{pr}(Y_D = j_D, Y_{B \setminus D} = 0_{B \setminus D})$ for
every $j_D \in \mathcal{J}_D$ and $D \subseteq B$. We have named $X_{J_B}$ the binary expansion of $Y_B$ because its
probability table can be hypothetically constructed by adding $2^{|J_B|} - |\mathcal{I}_B|$ structural-zero
cells to the probability table of $Y_B$. More specifically, the probability distribution of $X_{J_B}$
can be written as $\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K})$, for $K \subseteq J_B$, and these probabilities are
equal to zero if and only if $K$ is non-primary.
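
As a small illustration of this count, the sketch below enumerates the subsets $K \subseteq J_B$ for two three-level variables and separates the primary subsets from the non-primary ones, taking a subset to be non-primary when it contains more than one level of the same variable. The labels and the helper `is_primary` are hypothetical names chosen for this example.

```python
from itertools import combinations

# Non-baseline levels of the expanded variables (B = {2, 3}); labels are illustrative.
J = {2: ["D2", "R2"], 3: ["D3", "R3"]}
J_B = [lev for v in J for lev in J[v]]
owner = {lev: v for v in J for lev in J[v]}

def is_primary(K):
    """K is primary when it holds at most one level of each variable."""
    owners = [owner[lev] for lev in K]
    return len(owners) == len(set(owners))

subsets = [K for r in range(len(J_B) + 1) for K in combinations(J_B, r)]
primary = [K for K in subsets if is_primary(K)]

# Number of cells of the table of Y_B: each variable has its levels plus the baseline.
n_cells_Y = 1
for v in J:
    n_cells_Y *= len(J[v]) + 1

n_structural_zeros = len(subsets) - len(primary)
print(len(subsets), len(primary), n_cells_Y, n_structural_zeros)
```

Each primary subset matches one cell of the table of $Y_B$, and the remaining cells of the expanded table are the structural zeros.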

Lemma 5.4. Let $X_{J_B}$ be the binary expansion of $Y_B$ and let $K \subseteq J_B$. If $K$ is a primary
subset of $J_B$ then we can write $K = j_D$ by Lemma 5.1 and it holds both that
$\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) = \mathrm{pr}(Y_D = j_D, Y_{B \setminus D} = 0_{B \setminus D})$ and, if $j_D \neq \emptyset$, that
$\mathrm{pr}(X_K = 1_K) = \mathrm{pr}(Y_D = j_D)$. Furthermore, if $K$ is non-primary then
$\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) = \mathrm{pr}(X_K = 1_K) = 0$.

Proof. See Appendix A. □

We turn now to $Y_V^B = (Y_P, X_{J_B})$. The Möbius and the LML parameters of $Y_V$ are
$\mu = (\mu^{j_U})_{j \in \mathcal{J}_V, U \subseteq V}$ and $\gamma = (\gamma^{j_U})_{j \in \mathcal{J}_V, U \subseteq V}$, respectively. Furthermore, we denote the
Möbius and the LML parameters of $Y_V^B$ by $\tilde{\mu}$ and $\tilde{\gamma}$, respectively, where the entries of $\tilde{\mu}$ are
$\tilde{\mu}^{j_Q \cup 1_K}$ for every $j_Q \in \mathcal{J}_Q$ and $K \subseteq J_B$, and we use the shorthand $\tilde{\mu}^{j_Q \cup 1_K} = \tilde{\mu}^{j_Q}_K$. Similarly,
we denote the entries of $\tilde{\gamma}$ by $\tilde{\gamma}^{j_Q}_K$ for $j_Q \in \mathcal{J}_Q$ and $K \subseteq J_B$. The following theorem states
that both the Möbius and the LML parameters are dichotomization invariant in the sense
that $\mu$ and $\gamma$ are subvectors of $\tilde{\mu}$ and $\tilde{\gamma}$, respectively, whereas the remaining entries of $\tilde{\mu}$ and
$\tilde{\gamma}$ are equal to zero and not well-defined, respectively.
Theorem 5.5. Let $Y_V^B = (Y_P, X_{J_B})$ be the $B$-expansion of $Y_V$. Furthermore, let $\tilde{\mu}^{j_Q}_K$ and
$\tilde{\gamma}^{j_Q}_K$, for $j_Q \in \mathcal{J}_Q$ and $K \subseteq J_B$, be the Möbius and the LML parameter of $Y_V^B$, respectively,
whereas $\mu$ and $\gamma$ are the Möbius and the LML parameter of $Y_V$, respectively. If $L$ is a
primary subset of $P \cup J_B$, so that $L = Q \cup j_D$ as in Lemma 5.1, then $\mu^{j_{Q \cup D}} = \tilde{\mu}^{j_Q}_{j_D}$ and
$\gamma^{j_{Q \cup D}} = \tilde{\gamma}^{j_Q}_{j_D}$, for every $j_Q \in \mathcal{J}_Q$. Conversely, if $L$ is non-primary and we set $K = L \cap J_B$
and $Q = L \cap P$, then $\tilde{\mu}^{j_Q}_K = 0$ and, consequently, $\tilde{\gamma}^{j_Q}_K$ is not well-defined, for every $j_Q \in \mathcal{J}_Q$.

Proof. See Appendix A. □
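
The invariance on primary subsets can be illustrated numerically in the simplest setting. The sketch below assumes two three-level variables with $P = \{1\}$ and $B = \{2\}$, a randomly generated positive table, and the baseline stored at index 0; the helper names `two_way_lml` and `dichotomize` are hypothetical. It compares the two-way LML interaction computed from the table of $(Y_1, Y_2)$ with the one computed after replacing $Y_2$ by the indicator of a single level.

```python
import numpy as np

def two_way_lml(table, j1, j2):
    """gamma^{(j1, j2)} = log mu^{(j1, j2)} - log mu^{j1} - log mu^{j2}."""
    mu12 = table[j1, j2]
    mu1 = table[j1, :].sum()
    mu2 = table[:, j2].sum()
    return np.log(mu12) - np.log(mu1) - np.log(mu2)

def dichotomize(table, j2):
    """Table of (Y_1, X) where X = 1 if Y_2 = j2 and X = 0 otherwise."""
    indicator = table[:, j2]
    rest = table.sum(axis=1) - indicator
    return np.column_stack([rest, indicator])

rng = np.random.default_rng(0)
p = rng.random((3, 3)) + 0.1   # strictly positive, hypothetical table
p /= p.sum()

pairs = []
for j1 in (1, 2):
    for j2 in (1, 2):
        g = two_way_lml(p, j1, j2)                        # from Y_V
        g_tilde = two_way_lml(dichotomize(p, j2), j1, 1)  # from (Y_1, X^{j2})
        pairs.append((g, g_tilde))
        print(j1, j2, round(g, 6), round(g_tilde, 6))
```

The two values agree for every pair of levels because collapsing the other levels of $Y_2$ leaves $\mathrm{pr}(Y_1 = j_1, Y_2 = j_2)$ and both margins unchanged.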



6 Discussion 



Roverato et al. (2013) introduced the LML parameterization for binary data and explained
its advantages with respect to both the multivariate logistic and the Möbius parameterization.
The extension of this approach to variables with an arbitrary number of levels has
disclosed the additional invariance property that makes it possible to use the LML parameter
of $Y_V$ to characterize the connected set Markov property for any $B$-expansion $Y_V^B$ of
$Y_V$. As a consequence, using the LML parameterization amounts to implicitly working with
the binary expansions of variables or, from a different perspective, the LML parameterization
allows one to deal with expanded variables implicitly, so as to avoid all the difficulties
associated with the expansion operation. For instance, one need not worry about either the
presence of structural zeros or the inefficiencies deriving from the artificial increase in
dimensionality. In structural learning, the set $B$ does not need to be defined a priori, but
it can be specified after an LML model has been selected from data. For instance, $B$ can
be chosen so as to optimize the trade-off between readability of the graph and the need to
make explicit the learnt independencies involving every single expanded variable.

Open questions include the specification of the baseline level in situations where there
is no 'natural' baseline level, as well as the specification of the baseline level for ordinal
variables. The general issue of developing model search strategies for the identification of
a parsimonious bi-directed graph submodel is still open. However, it may be appealing to
restrict the search space to the models characterized by the family of $B$-expanded graphs,
because it is made up of interpretable models, and it is smaller than the family of all
the models obtained by constraining a subset of the LML parameters to zero.



We close this discussion by remarking that, by Theorem 5.5, the Möbius parameterization
also satisfies the same dichotomization invariance property as the LML parameterization.
However, the LML parameterization has the advantage that marginal independence
relationships correspond to linear zero constraints in the space of the parameters.
A Proofs

Proof of Theorem 4.1

(i) $\Rightarrow$ (ii). Firstly, we note that the factorizations in (ii) are trivially true when at least one
of $A'$ and $B'$ is equal to the empty set because $\mu^{j_\emptyset} = 1$. The independence $Y_A \perp\!\!\!\perp Y_B$
implies that $Y_{A'} \perp\!\!\!\perp Y_{B'}$ for every $A' \subseteq A$ and $B' \subseteq B$ with $A', B' \neq \emptyset$. In turn, $Y_{A'} \perp\!\!\!\perp Y_{B'}$
implies that, for every $j \in \mathcal{J}_V$, $\mathrm{pr}(Y_{A'} = j_{A'}, Y_{B'} = j_{B'}) = \mathrm{pr}(Y_{A'} = j_{A'}) \times \mathrm{pr}(Y_{B'} = j_{B'})$
and since $Y_{A'} = j_{A'}$ if and only if $X^j_{A'} = 1_{A'}$, and similarly for $Y_{B'}$, this implies that
$\mathrm{pr}(X^j_{A'} = 1_{A'}, X^j_{B'} = 1_{B'}) = \mathrm{pr}(X^j_{A'} = 1_{A'}) \times \mathrm{pr}(X^j_{B'} = 1_{B'})$ and the result follows from the definition of the Möbius parameter
because the latter factorization can be written as $\mu^{j_{A' \cup B'}} = \mu^{j_{A'}} \times \mu^{j_{B'}}$.



(i) $\Leftarrow$ (ii). By Theorem 3.1, (ii) implies that $X^j_A \perp\!\!\!\perp X^j_B$ for every $j \in \mathcal{J}_V$, and therefore that
$\mathrm{pr}(X^j_A = 1_A, X^j_B = 1_B) = \mathrm{pr}(X^j_A = 1_A) \times \mathrm{pr}(X^j_B = 1_B)$, for every $j \in \mathcal{J}_V$. The latter
factorization is equivalent to

$$\mathrm{pr}(Y_A = j_A, Y_B = j_B) = \mathrm{pr}(Y_A = j_A) \times \mathrm{pr}(Y_B = j_B) \quad \text{for every } j \in \mathcal{J}_V \tag{6}$$

and we have to show that the factorization in (6) holds for every $i \in \mathcal{I}_V$. This follows
immediately from Theorem 8 of Drton (2009) but we formally prove it for completeness.



Every $i \in \mathcal{I}_V$ with $i \notin \mathcal{J}_V$ contains at least one level labeled as baseline, that is as "0", and
the proof is by induction on the number of baseline levels in $i$. Let $k$ denote the number of
baseline levels in $i \in \mathcal{I}_V$; the factorization in (6) holds for $k = 0$ and we show that if it is
true for $k = r - 1$, with $r > 0$, then it is also true for $k = r$. Assume that $k = r$ and that,
without loss of generality, $i_v = 0_v$ with $v \in A$. Hence, if $A' = A \setminus \{v\}$,

$$
\begin{aligned}
\mathrm{pr}(Y_A = i_A, Y_B = i_B) &= \mathrm{pr}(Y_v = 0_v, Y_{A'} = i_{A'}, Y_B = i_B) \\
&= \mathrm{pr}(Y_{A'} = i_{A'}, Y_B = i_B) - \sum_{i_v = 1_v}^{d_v} \mathrm{pr}(Y_v = i_v, Y_{A'} = i_{A'}, Y_B = i_B)
\end{aligned}
$$

and, since the number of "0"s in $i_{A' \cup B}$ is equal to $r - 1$, it follows from the induction
assumption that

$$
\begin{aligned}
\mathrm{pr}(Y_A = i_A, Y_B = i_B) &= \mathrm{pr}(Y_{A'} = i_{A'}) \times \mathrm{pr}(Y_B = i_B) - \sum_{i_v = 1_v}^{d_v} \mathrm{pr}(Y_v = i_v, Y_{A'} = i_{A'}) \times \mathrm{pr}(Y_B = i_B) \\
&= \Big\{ \mathrm{pr}(Y_{A'} = i_{A'}) - \sum_{i_v = 1_v}^{d_v} \mathrm{pr}(Y_v = i_v, Y_{A'} = i_{A'}) \Big\} \, \mathrm{pr}(Y_B = i_B) \\
&= \mathrm{pr}(Y_A = i_A) \times \mathrm{pr}(Y_B = i_B)
\end{aligned}
$$

as required.

(ii) $\Leftrightarrow$ (iii). This follows immediately from the equivalence, for every $j \in \mathcal{J}_V$, of (ii) and (iii)
in Theorem 3.1.

Proof of Corollary 4.2



As a consequence of Theorem 4 of Richardson (2003), we have to show that $Y_{C_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp Y_{C_r}$
for every disconnected set $U$ of $\mathcal{G}$ if and only if for every disconnected set $U$ of $\mathcal{G}$ it holds
that $\gamma^{j_U} = 0$ for every $j_U \in \mathcal{J}_U$. Recall that $C_1, \ldots, C_r$ are the connected components of $U$
and that $r \geq 2$.

If $Y_{C_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp Y_{C_r}$ then we can set $A = C_1$ and $B = C_2 \cup \cdots \cup C_r$ so that $U = A \cup B$
and $Y_A \perp\!\!\!\perp Y_B$. The result follows by noticing that every $j_U \in \mathcal{J}_U$ can be written as $j_{A \cup B}$ so
that $\gamma^{j_U} = \gamma^{j_{A \cup B}} = 0$ by Theorem 4.1.

We now show the reverse implication, that is, that if $\gamma^{j_U} = 0$ for every $j_U \in \mathcal{J}_U$ such that
$U$ is disconnected in $\mathcal{G}$, then for every disconnected set $U$ of $\mathcal{G}$ it holds that $Y_{C_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp Y_{C_r}$.
Let $A$ and $B$ be defined as above. Then, for every $A' \subseteq A$ and $B' \subseteq B$, with $A', B' \neq \emptyset$, the
set $A' \cup B'$ is disconnected in $\mathcal{G}$ so that, by assumption, $\gamma^{j_{A' \cup B'}} = 0$ for every $j_{A' \cup B'} \in \mathcal{J}_{A' \cup B'}$.
This is equivalent to saying that $\gamma^{j_{A' \cup B'}} = 0$ for every $j \in \mathcal{J}_V$, $A' \subseteq A$ and $B' \subseteq B$ such that
$A' \neq \emptyset$ and $B' \neq \emptyset$. The latter, by Theorem 4.1, implies that $Y_A \perp\!\!\!\perp Y_B$ or, equivalently, that
$Y_{C_1} \perp\!\!\!\perp Y_{C_2 \cup \cdots \cup C_r}$. The same procedure can then be applied, for every $i = 1, \ldots, r - 1$, with
$A = C_i$ and $B = C_{i+1} \cup \cdots \cup C_r$ to show that $Y_{C_i} \perp\!\!\!\perp Y_{C_{i+1} \cup \cdots \cup C_r}$ for every $i = 1, \ldots, r - 1$
which, in turn, implies that $Y_{C_1} \perp\!\!\!\perp \cdots \perp\!\!\!\perp Y_{C_r}$, as required.

Proof of Lemma 5.3

In this proof we repeatedly use the fact that if the distribution of a set of variables satisfies the 
connected set Markov property with respect to a given bi-directed graph then the marginal 
distribution of any subset of the variables satisfies the same Markov property with respect 
to the corresponding induced subgraph. This result follows immediately from the definition 
of the connected set Markov property because all the independence relationships implied by 
the subgraph are also encoded by the larger bi-directed graph. 

Assume that the distribution of $(Y_P, X_{J_B})$ satisfies the connected set Markov property
with respect to $\mathcal{G}^B$. Hence, for every $j_B \in \mathcal{J}_B$, the random vector $(Y_P, X_{j_B})$ is a subvector of
$(Y_P, X_{J_B})$ and, therefore, its distribution satisfies the connected set Markov property with
respect to the relevant induced subgraph $\mathcal{G}^B_{P \cup j_B}$. Hence, it is sufficient to show the inverse
implication.

We denote by $L \subseteq P \cup J_B$ an arbitrary disconnected set in $\mathcal{G}^B$ and by $C_1, \ldots, C_r$ its
connected components. Furthermore, we set $Q = L \cap P$, $K = L \cap J_B$ and, to make the
notation simpler, we write $Z = Y_V^B = (Y_P, X_{J_B})$. Hence, we have to show that if the
distribution of $(Y_P, X_{j_B})$ satisfies the connected set Markov property with respect to $\mathcal{G}^B_{P \cup j_B}$
for every $j_B \in \mathcal{J}_B$ then

$$\mathrm{pr}(Z_L = z_L) = \prod_{l=1}^{r} \mathrm{pr}(Z_{C_l} = z_{C_l}) \quad \text{for every } z_L \in \mathcal{I}_Q \times \{0,1\}^{|K|}. \tag{7}$$

For $L \subseteq P \cup J_B$ we set $D^* = \{v \mid v \in B \text{ and } K \cap J_v \neq \emptyset\}$. Note that, if $L$ is primary we can
write $L = Q \cup j_D$ by Lemma 5.1 and, in this case, $D = D^*$. Furthermore, it is easy to check
that $|K| \geq |D^*|$ with $|K| = |D^*|$ if and only if $L$ is primary. The proof is by induction on
the value $(|K| - |D^*|)$.

We first assume $(|K| - |D^*|) = 0$, which includes the case where $K = \emptyset$. If $(|K| - |D^*|) = 0$
then $L = Q \cup j_D$ is primary and there exists a $j_B \in \mathcal{J}_B$ such that $L = Q \cup j_D \subseteq P \cup j_B$.
Hence, the factorization (7) follows from the fact that, by assumption, $(Y_P, X_{j_B})$ satisfies
the connected set Markov property with respect to $\mathcal{G}^B_{P \cup j_B}$.

We now show that if the result is true for $(|K| - |D^*|) = k$ with $k \geq 0$ then it is also
true for $(|K| - |D^*|) = k + 1$. If $(|K| - |D^*|) = k + 1$ then $(|K| - |D^*|) > 0$ so that $L$ is
non-primary and, in turn, this implies that there exists a $v \in B$ such that $|J_v \cap K| \geq 2$.
Hence, we set $J'_v = J_v \cap K$ and remark that the subgraph induced by $J'_v$ is complete because
$\mathcal{G}^B$ is a $B$-expanded graph. This implies that $J'_v$ is contained in exactly one connected
component of $L$ and we assume, without loss of generality, that $J'_v \subseteq C_1$. In this way, $X_{J'_v}$ is
a subvector of $Z_{C_1}$ which, in turn, is a subvector of $Z_L$. If $z_L \in \mathcal{I}_Q \times \{0,1\}^{|K|}$ then there are
three possible cases: (i) two or more variables in $X_{J'_v}$ take value 1, (ii) exactly one variable
in $X_{J'_v}$ takes value 1 and (iii) all the variables in $X_{J'_v}$ take on the value 0.

In the case (i) we can find two distinct elements $j_v, j'_v \in J'_v$ such that $X^{j_v} = 1$ and
$X^{j'_v} = 1$. However, $\mathrm{pr}(X^{j_v} = 1, X^{j'_v} = 1) = \mathrm{pr}(Y_v = j_v, Y_v = j'_v) = 0$ and this implies that
both $\mathrm{pr}(Z_L = z_L) = 0$ and $\mathrm{pr}(Z_{C_1} = z_{C_1}) = 0$ so that the factorization (7) is trivially true.

In the case (ii) we can find two elements $j_v, j'_v \in J'_v$ such that $X^{j_v} = 1$ and $X^{j'_v} = 0$. If
$L' = L \setminus \{j'_v\}$ and $K' = K \setminus \{j'_v\}$ then $(|K'| - |D^*|) = k$ and, since $j'_v \in C_1$, the connected components of $L'$ are
obtained by taking the connected components of $C'_1 = C_1 \setminus \{j'_v\}$ together with $C_2, \ldots, C_r$.
Hence, by the induction assumption

$$\mathrm{pr}(Z_{L'} = z_{L'}) = \mathrm{pr}(Z_{C'_1} = z_{C'_1}) \prod_{l=2}^{r} \mathrm{pr}(Z_{C_l} = z_{C_l}) \tag{8}$$

where $\mathrm{pr}(Z_{C'_1} = z_{C'_1})$ in (8) further factorizes according to the connected components of
$C'_1$. The result follows from (8) by noticing that $X^{j'_v} = 0$ is implied by $X^{j_v} = 1$ so that
$\mathrm{pr}(X^{j_v} = 1) = \mathrm{pr}(X^{j_v} = 1, X^{j'_v} = 0)$ and, more generally, both $\mathrm{pr}(Z_L = z_L) = \mathrm{pr}(Z_{L'} = z_{L'})$
and $\mathrm{pr}(Z_{C_1} = z_{C_1}) = \mathrm{pr}(Z_{C'_1} = z_{C'_1})$.

In the case (iii) we can find two elements $j_v, j'_v \in J'_v$ such that $X^{j_v} = 0$ and $X^{j'_v} = 0$ and
we can write both

$$\mathrm{pr}(Z_L = z_L) = \mathrm{pr}(Z_{L'} = z_{L'}) - \mathrm{pr}(Z_{L'} = z_{L'}, X^{j'_v} = 1) \tag{9}$$

and

$$\mathrm{pr}(Z_{C_1} = z_{C_1}) = \mathrm{pr}(Z_{C'_1} = z_{C'_1}) - \mathrm{pr}(Z_{C'_1} = z_{C'_1}, X^{j'_v} = 1), \tag{10}$$

where, as above, $L' = L \setminus \{j'_v\}$ and $C'_1 = C_1 \setminus \{j'_v\}$. By the induction assumption, $\mathrm{pr}(Z_{L'} = z_{L'})$
in (9) factorizes as in (8). The probability $\mathrm{pr}(Z_{L'} = z_{L'}, X^{j'_v} = 1)$ in (9) belongs to the
case (ii) above because here $X^{j_v} = 0$ and $X^{j'_v} = 1$ so that

$$\mathrm{pr}(Z_{L'} = z_{L'}, X^{j'_v} = 1) = \mathrm{pr}(Z_{C'_1} = z_{C'_1}, X^{j'_v} = 1) \prod_{l=2}^{r} \mathrm{pr}(Z_{C_l} = z_{C_l}). \tag{11}$$

Hence, by applying the factorizations (8) and (11) in (9) one obtains

$$\mathrm{pr}(Z_L = z_L) = \big\{ \mathrm{pr}(Z_{C'_1} = z_{C'_1}) - \mathrm{pr}(Z_{C'_1} = z_{C'_1}, X^{j'_v} = 1) \big\} \prod_{l=2}^{r} \mathrm{pr}(Z_{C_l} = z_{C_l})$$

and then, by (10), $\mathrm{pr}(Z_L = z_L) = \prod_{l=1}^{r} \mathrm{pr}(Z_{C_l} = z_{C_l})$ as required.



Proof of Lemma 5.4



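Let $K$ be a primary subset of $J_B$. In this case, by Lemma 5.1 there exists a unique subset
$D \subseteq B$ such that $K = j_D \in \mathcal{J}_D$ and, therefore, $X_K = X_{j_D} = (X^{j_v})_{j_v \in j_D}$. In order to
show that $\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) = \mathrm{pr}(Y_D = j_D, Y_{B \setminus D} = 0_{B \setminus D})$ we notice that
(i) $J_B = J_{B \setminus D} \cup J_D$ and therefore also $J_B \setminus j_D = J_{B \setminus D} \cup (J_D \setminus j_D)$ because $j_D \subseteq J_D$; (ii)
$X_{j_D} = 1_{j_D}$ implies that $X_{J_D \setminus j_D} = 0_{J_D \setminus j_D}$; (iii) it follows from the definition of the binary expansion that both $X_{j_D} = 1_{j_D}$ iff
$Y_D = j_D$, and $X_{J_{B \setminus D}} = 0_{J_{B \setminus D}}$ iff $Y_{B \setminus D} = 0_{B \setminus D}$. Hence, by applying (i) to (iii) one obtains

$$
\begin{aligned}
\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) &= \mathrm{pr}(X_{j_D} = 1_{j_D}, X_{J_B \setminus j_D} = 0_{J_B \setminus j_D}) \\
&= \mathrm{pr}(X_{j_D} = 1_{j_D}, X_{J_{B \setminus D}} = 0_{J_{B \setminus D}}, X_{J_D \setminus j_D} = 0_{J_D \setminus j_D}) \\
&= \mathrm{pr}(X_{j_D} = 1_{j_D}, X_{J_{B \setminus D}} = 0_{J_{B \setminus D}}) \\
&= \mathrm{pr}(Y_D = j_D, Y_{B \setminus D} = 0_{B \setminus D}),
\end{aligned}
$$

as required. The fact that for $j_D \neq \emptyset$ it holds that $\mathrm{pr}(X_K = 1_K) = \mathrm{pr}(Y_D = j_D)$ follows
from (iii) above, and this completes the proof of the first statement. We prove the second
statement by noticing that, if $K \subseteq J_B$ is non-primary, then it follows from the definition of
primary subset that there exists a $v \in B$ such that $|K \cap J_v| > 1$. Consequently, there exists
a pair $j_v, j'_v \in J_v$ with $j_v \neq j'_v$ such that $j_v, j'_v \in K$. In this case, $\mathrm{pr}(X_K = 1_K) \leq \mathrm{pr}(X^{j_v} = 1, X^{j'_v} = 1)
= \mathrm{pr}(Y_v = j_v, Y_v = j'_v) = 0$. This also implies that $\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) = 0$
because $\mathrm{pr}(X_K = 1_K, X_{J_B \setminus K} = 0_{J_B \setminus K}) \leq \mathrm{pr}(X_K = 1_K) = 0$.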

A basic lemma 



This lemma will be used in the proof of Theorem 5.5 below. 



Lemma A.1. Let $X_{J_B}$ be the binary expansion of $Y_B$. Furthermore, let $\bar{\mu} = (\bar{\mu}_K)_{K \subseteq J_B}$ and
$\bar{\gamma} = (\bar{\gamma}_K)_{K \subseteq J_B}$ be the Möbius and the LML parameter of $X_{J_B}$, respectively, whereas $\mu$ and $\gamma$
are the Möbius and the LML parameter of $Y_V$, respectively. If $K$ is a primary subset of $J_B$
then we can write $K = j_D$, by Lemma 5.1, and it holds that both $\mu^{j_D} = \bar{\mu}_{j_D}$ and $\gamma^{j_D} = \bar{\gamma}_{j_D}$.
Furthermore, for every $K \subseteq J_B$ that is non-primary it holds that $\bar{\mu}_K = 0$ and, consequently,
$\bar{\gamma}_K$ is not well-defined.

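Proof. For every $K \subseteq J_B$ it holds by (1) that $\bar{\mu}_K = \mathrm{pr}(X_K = 1_K)$. Recall that, if $K$ is a
primary subset of $J_B$ then, by Lemma 5.1, there exists a $D \subseteq B$ and a $j_D \in \mathcal{J}_D$ such that
$K = j_D$. Hence, if $K = j_D$ is a primary subset of $J_B$ then it follows from Lemma 5.4 and
(5) that $\bar{\mu}_{j_D} = \mathrm{pr}(X_{j_D} = 1_{j_D}) = \mathrm{pr}(Y_D = j_D) = \mu^{j_D}$. Conversely, if $K$ is non-primary then
$\bar{\mu}_K = \mathrm{pr}(X_K = 1_K) = 0$ by Lemma 5.4.

Similarly, by definition, $\bar{\gamma}_K = \sum_{E \subseteq K} (-1)^{|K \setminus E|} \log \bar{\mu}_E$, which is well-defined if and only
if $\bar{\mu}_K > 0$. Hence, if $K$ is non-primary then the quantity $\bar{\gamma}_K$ is not well-defined because
in this case $\bar{\mu}_K = 0$, as shown above. On the other hand, if $K = j_D$ is a primary subset
of $J_B$ then it is not difficult to check that (i) the set $\{E \mid E \subseteq j_D\}$ can be alternatively
written in the form $\{j_H = (j_D)_H \mid H \subseteq D\}$, (ii) if $H \subseteq D$ then $(j_D)_H = j_H \in \mathcal{J}_H$ so that
$\bar{\mu}_{j_H} = \mu^{j_H}$, and (iii) for every $H \subseteq D$ it holds that $|j_D| = |D|$, $|j_H| = |H|$ and $j_H \subseteq j_D$ so
that $|j_D \setminus j_H| = |D \setminus H|$. By using points (i) to (iii) above one obtains

$$
\bar{\gamma}_{j_D} = \sum_{E \subseteq j_D} (-1)^{|j_D \setminus E|} \log \bar{\mu}_E
= \sum_{H \subseteq D} (-1)^{|j_D \setminus j_H|} \log \bar{\mu}_{j_H}
= \sum_{H \subseteq D} (-1)^{|D \setminus H|} \log \mu^{j_H}
= \gamma^{j_D}.
$$

□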



Proof of Theorem 5.5

We first consider the case where $L$ is a primary subset of $P \cup J_B$ so that $L = Q \cup j_D$ as in
Lemma 5.1. If we take the binary expansion of $Y_V$ with respect to $J_V = J_P \cup J_B$, that is
equal to $X_{J_V} = (X_{J_P}, X_{J_B})$, then it follows from Lemma A.1 that

$$\mu^{j_{Q \cup D}} = \bar{\mu}_{j_{Q \cup D}} \quad \text{and} \quad \gamma^{j_{Q \cup D}} = \bar{\gamma}_{j_{Q \cup D}} \tag{12}$$

for every $j_Q \in \mathcal{J}_Q$. Recall that, since both the Möbius and the LML parameters satisfy
the upward compatibility property, the computation of both $\bar{\mu}_{j_{Q \cup D}}$ and $\bar{\gamma}_{j_{Q \cup D}}$ can be
carried out with respect to the distribution of $(X_{j_Q}, X_{j_D})$, which is a subvector of $(X_{J_Q}, X_{J_D})$.
It also follows from upward compatibility that, for every $j_Q \in \mathcal{J}_Q$, both $\tilde{\mu}^{j_Q}_{j_D}$ and $\tilde{\gamma}^{j_Q}_{j_D}$ can
be computed with respect to the distribution of $(Y_Q, X_{j_D})$. One should notice that $X_{j_D}$
contains exactly one entry for every variable in $Y_D$ and, as a consequence, the probability
table of $(Y_Q, X_{j_D})$ is strictly positive because it can be obtained by collapsing some levels
of the probability table of $Y_{Q \cup D}$, which is assumed to be strictly positive. For this reason it
makes sense to consider $(Y_Q, X_{j_D})$ in place of $Y_V$ in Lemma A.1 to show that

$$\tilde{\mu}^{j_Q}_{j_D} = \bar{\mu}_{j_{Q \cup D}} \quad \text{and} \quad \tilde{\gamma}^{j_Q}_{j_D} = \bar{\gamma}_{j_{Q \cup D}} \tag{13}$$

where the quantities $\bar{\mu}_{j_{Q \cup D}}$ and $\bar{\gamma}_{j_{Q \cup D}}$ in (13) coincide with the same quantities in (12) because
they are computed with respect to the binary expansion $(X_{J_Q}, X_{j_D})$ of $(Y_Q, X_{j_D})$. Hence,
(12) and (13) lead to $\mu^{j_{Q \cup D}} = \tilde{\mu}^{j_Q}_{j_D}$ and $\gamma^{j_{Q \cup D}} = \tilde{\gamma}^{j_Q}_{j_D}$ as required.



Consider now the case where $L$ is non-primary. If we set $K = L \cap J_B$ and $Q = L \cap P$,
then, for every $j_Q \in \mathcal{J}_Q$, it follows from the definition of the Möbius parameter and Lemma 5.4
that $\tilde{\mu}^{j_Q}_K = \mathrm{pr}(Y_Q = j_Q, X_K = 1_K) \leq \mathrm{pr}(X_K = 1_K) = 0$. This implies that $\tilde{\gamma}^{j_Q}_K$ is not
well-defined because its computation involves the logarithm of $\tilde{\mu}^{j_Q}_K = 0$.



Proof of Theorem 5.2



We start this proof by remarking that, by Lemma 5.1, a subset $L \subseteq P \cup J_B$ is such that
$L \subseteq P \cup j_B$ for some $j_B \in \mathcal{J}_B$ if and only if it is a primary subset of $P \cup J_B$. This also
implies that if $L \subseteq P \cup j_B$ we can write $L = Q \cup j_D$, as in Lemma 5.1.

By Lemma 5.3 the distribution of $Y_V^B = (Y_P, X_{J_B})$ satisfies the connected set Markov
property with respect to $\mathcal{G}^B$ if and only if the distribution of $(Y_P, X_{j_B})$ satisfies the connected
set Markov property with respect to $\mathcal{G}^B_{P \cup j_B}$ for every $j_B \in \mathcal{J}_B$.

For every $j_B \in \mathcal{J}_B$, the random vector $X_{j_B}$ contains exactly one entry for every variable in
$Y_B$ and, as a consequence, the probability table of $(Y_P, X_{j_B})$ is strictly positive because it can
be obtained by collapsing some levels of the probability table of $Y_V$, which is strictly positive
by assumption. Hence, we can apply Corollary 3.2 and it follows that the distribution of
$(Y_P, X_{j_B})$ satisfies the connected set Markov property with respect to $\mathcal{G}^B_{P \cup j_B}$ if and only
if for every set $L \subseteq P \cup j_B$ that is disconnected in $\mathcal{G}^B_{P \cup j_B}$ it holds that $\tilde{\gamma}^{j_Q}_{j_D} = 0$ for every
$j_Q \in \mathcal{J}_Q$. Note that it is possible that either $Q = \emptyset$ or $D = \emptyset$. We can thus state that:

    The distribution of $Y_V^B = (Y_P, X_{J_B})$ satisfies the connected set Markov property
    with respect to $\mathcal{G}^B$ if and only if for every $j_B \in \mathcal{J}_B$ and every set $L \subseteq P \cup j_B$
    that is disconnected in $\mathcal{G}^B_{P \cup j_B}$ it holds that $\tilde{\gamma}^{j_Q}_{j_D} = 0$ for every $j_Q \in \mathcal{J}_Q$.

Notice that: (a) a subset $L \subseteq P \cup J_B$ is such that $L \subseteq P \cup j_B$ for some $j_B \in \mathcal{J}_B$ if and only
if it is a primary subset of $P \cup J_B$, (b) $L \subseteq P \cup j_B$ is disconnected in $\mathcal{G}^B_{P \cup j_B}$ if and only if
it is disconnected in $\mathcal{G}^B$ and (c) $\tilde{\gamma}^{j_Q}_{j_D} = \gamma^{j_{Q \cup D}}$ by Theorem 5.5. Hence, by using (a), (b) and
(c) the statement above can be rephrased as:

    The distribution of $Y_V^B = (Y_P, X_{J_B})$ satisfies the connected set Markov property
    with respect to $\mathcal{G}^B$ if and only if for every set $L \subseteq P \cup J_B$ that is a primary subset
    of $P \cup J_B$ and disconnected in $\mathcal{G}^B$ it holds that $\gamma^{j_{Q \cup D}} = 0$ for every $j_Q \in \mathcal{J}_Q$,

and this completes the proof.